Overview

Dataset statistics

Number of variables: 15
Number of observations: 1000
Missing cells: 0
Missing cells (%): 0.0%
Duplicate rows: 0
Duplicate rows (%): 0.0%
Total size in memory: 493.7 KiB
Average record size in memory: 505.6 B

Variable types

NUM: 6
CAT: 6
BOOL: 3

Reproduction

Analysis started: 2020-06-22 14:23:45.483514
Analysis finished: 2020-06-22 14:23:54.513437
Duration: 9.03 seconds
Version: pandas-profiling v2.7.1
Command line: pandas_profiling --config_file config.yaml [YOUR_FILE.csv]
Download configuration: config.yaml

Warnings

NumCompaniesWorked has 130 (13.0%) zeros
TrainingTimesLastYear has 34 (3.4%) zeros

Variables

Attrition
Boolean

Distinct count: 2
Unique (%): 0.2%
Missing: 0
Missing (%): 0.0%
Memory size: 15.6 KiB

Value | Count | Frequency (%)
0 | 843 | 84.3%
1 | 157 | 15.7%

BusinessTravel
Categorical

Distinct count: 3
Unique (%): 0.3%
Missing: 0
Missing (%): 0.0%
Memory size: 15.6 KiB

Value | Count | Frequency (%)
Travel_Rarely | 709 | 70.9%
Travel_Frequently | 199 | 19.9%
Non-Travel | 92 | 9.2%

Length

Max length: 17
Mean length: 13.52
Min length: 10

Unicode category | Count | Frequency (%)
Lowercase_Letter | 11 | 64.7%
Uppercase_Letter | 4 | 23.5%
Dash_Punctuation | 1 | 5.9%
Connector_Punctuation | 1 | 5.9%

Script | Count | Frequency (%)
Latin | 15 | 88.2%
Common | 2 | 11.8%

Block | Count | Frequency (%)
ASCII | 17 | 100.0%
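
The length and character figures for BusinessTravel can be cross-checked from its three values and their counts. The sketch below is not pandas-profiling's own code. It assumes the character tables count each distinct character once across all values, and it uses the two-letter category codes returned by Python's `unicodedata` module (`Ll` for Lowercase_Letter, `Lu` for Uppercase_Letter, `Pd` for Dash_Punctuation, `Pc` for Connector_Punctuation).

```python
import unicodedata
from collections import Counter

# Values and counts copied from the BusinessTravel frequency table.
value_counts = {"Travel_Rarely": 709, "Travel_Frequently": 199, "Non-Travel": 92}

total = sum(value_counts.values())
lengths = [len(value) for value in value_counts]

# Mean length is weighted by how often each value occurs.
mean_length = sum(len(v) * n for v, n in value_counts.items()) / total

# Assumption: each distinct character is counted once, whatever its frequency.
distinct_chars = set("".join(value_counts))
category_counts = Counter(unicodedata.category(ch) for ch in distinct_chars)

print(max(lengths), min(lengths), round(mean_length, 2))
print(dict(category_counts))
```

Under that assumption the counts come out as 11 lowercase letters, 4 uppercase letters, 1 dash and 1 connector, which agrees with the report.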
 

DistanceFromHome
Real number (ℝ≥0)

Distinct count: 29
Unique (%): 2.9%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 9.145
Minimum: 1
Maximum: 29
Zeros: 0
Zeros (%): 0.0%
Memory size: 15.6 KiB

Quantile statistics

Minimum: 1
5-th percentile: 1
Q1: 2
Median: 7
Q3: 13
95-th percentile: 26
Maximum: 29
Range: 28
Interquartile range (IQR): 11

Descriptive statistics

Standard deviation: 8.120955912
Coefficient of variation (CV): 0.8880214229
Kurtosis: -0.1030362226
Mean: 9.145
Median Absolute Deviation (MAD): 5
Skewness: 1.008336875
Sum: 9145
Variance: 65.94992492
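
Several of these figures are derived from the others, which gives a quick consistency check. A minimal sketch using the DistanceFromHome values reported above; the variable names are illustrative and not part of the report.

```python
# Values copied from the DistanceFromHome summary.
minimum, maximum = 1, 29
q1, q3 = 2, 13
mean = 9.145
std = 8.120955912
n = 1000  # number of observations

value_range = maximum - minimum  # reported Range: 28
iqr = q3 - q1                    # reported IQR: 11
cv = std / mean                  # reported CV: 0.8880214229
variance = std ** 2              # reported Variance: 65.94992492
total = mean * n                 # reported Sum: 9145

print(value_range, iqr, round(cv, 6), round(variance, 4), round(total))
```

The same relations hold for the other numeric variables in this report, up to rounding of the printed values.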
Histogram with fixed size bins (bins=10)

Common values

Value | Count | Frequency (%)
2 | 143 | 14.3%
1 | 142 | 14.2%
9 | 65 | 6.5%
7 | 62 | 6.2%
10 | 58 | 5.8%
3 | 52 | 5.2%
8 | 52 | 5.2%
4 | 45 | 4.5%
5 | 44 | 4.4%
6 | 44 | 4.4%
Other values (19) | 293 | 29.3%

Minimum 5 values

Value | Count | Frequency (%)
1 | 142 | 14.2%
2 | 143 | 14.3%
3 | 52 | 5.2%
4 | 45 | 4.5%
5 | 44 | 4.4%

Maximum 5 values

Value | Count | Frequency (%)
29 | 21 | 2.1%
28 | 17 | 1.7%
27 | 9 | 0.9%
26 | 18 | 1.8%
25 | 17 | 1.7%

EducationField
Categorical

Distinct count: 6
Unique (%): 0.6%
Missing: 0
Missing (%): 0.0%
Memory size: 15.6 KiB

Value | Count | Frequency (%)
Life Sciences | 403 | 40.3%
Medical | 333 | 33.3%
Marketing | 105 | 10.5%
Technical Degree | 82 | 8.2%
Other | 57 | 5.7%
Human Resources | 20 | 2.0%

Length

Max length: 16
Mean length: 10.412
Min length: 5

Unicode category | Count | Frequency (%)
Lowercase_Letter | 17 | 65.4%
Uppercase_Letter | 8 | 30.8%
Space_Separator | 1 | 3.8%

Script | Count | Frequency (%)
Latin | 25 | 96.2%
Common | 1 | 3.8%

Block | Count | Frequency (%)
ASCII | 26 | 100.0%
 
EnvironmentSatisfaction
Categorical

Distinct count: 4
Unique (%): 0.4%
Missing: 0
Missing (%): 0.0%
Memory size: 15.6 KiB

Value | Count | Frequency (%)
4 | 308 | 30.8%
3 | 304 | 30.4%
2 | 196 | 19.6%
1 | 192 | 19.2%

Length

Max length: 1
Mean length: 1
Min length: 1

Unicode category | Count | Frequency (%)
Decimal_Number | 4 | 100.0%

Script | Count | Frequency (%)
Common | 4 | 100.0%

Block | Count | Frequency (%)
ASCII | 4 | 100.0%

JobInvolvement
Categorical

Distinct count: 4
Unique (%): 0.4%
Missing: 0
Missing (%): 0.0%
Memory size: 15.6 KiB

Value | Count | Frequency (%)
3 | 593 | 59.3%
2 | 259 | 25.9%
4 | 94 | 9.4%
1 | 54 | 5.4%

Length

Max length: 1
Mean length: 1
Min length: 1

Unicode category | Count | Frequency (%)
Decimal_Number | 4 | 100.0%

Script | Count | Frequency (%)
Common | 4 | 100.0%

Block | Count | Frequency (%)
ASCII | 4 | 100.0%

JobSatisfaction
Categorical

Distinct count: 4
Unique (%): 0.4%
Missing: 0
Missing (%): 0.0%
Memory size: 15.6 KiB

Value | Count | Frequency (%)
3 | 321 | 32.1%
4 | 306 | 30.6%
1 | 188 | 18.8%
2 | 185 | 18.5%

Length

Max length: 1
Mean length: 1
Min length: 1

Unicode category | Count | Frequency (%)
Decimal_Number | 4 | 100.0%

Script | Count | Frequency (%)
Common | 4 | 100.0%

Block | Count | Frequency (%)
ASCII | 4 | 100.0%

MaritalStatus
Categorical

Distinct count: 3
Unique (%): 0.3%
Missing: 0
Missing (%): 0.0%
Memory size: 15.6 KiB

Value | Count | Frequency (%)
Married | 469 | 46.9%
Single | 314 | 31.4%
Divorced | 217 | 21.7%

Length

Max length: 8
Mean length: 6.903
Min length: 6

Unicode category | Count | Frequency (%)
Lowercase_Letter | 11 | 78.6%
Uppercase_Letter | 3 | 21.4%

Script | Count | Frequency (%)
Latin | 14 | 100.0%

Block | Count | Frequency (%)
ASCII | 14 | 100.0%

MonthlyIncome
Real number (ℝ≥0)

Distinct count: 941
Unique (%): 94.1%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 6464.418
Minimum: 1009
Maximum: 19999
Zeros: 0
Zeros (%): 0.0%
Memory size: 15.6 KiB

Quantile statistics

Minimum: 1009
5-th percentile: 2120.7
Q1: 2874
Median: 4877.5
Q3: 8393
95-th percentile: 17660.3
Maximum: 19999
Range: 18990
Interquartile range (IQR): 5519

Descriptive statistics

Standard deviation: 4685.919516
Coefficient of variation (CV): 0.7248787927
Kurtosis: 1.031928028
Mean: 6464.418
Median Absolute Deviation (MAD): 2174.5
Skewness: 1.374173983
Sum: 6464418
Variance: 21957841.71

Histogram with fixed size bins (bins=10)

Common values

Value | Count | Frequency (%)
5562 | 3 | 0.3%
2380 | 3 | 0.3%
2342 | 3 | 0.3%
2404 | 3 | 0.3%
2610 | 3 | 0.3%
2451 | 3 | 0.3%
3452 | 2 | 0.2%
17861 | 2 | 0.2%
2269 | 2 | 0.2%
2720 | 2 | 0.2%
Other values (931) | 974 | 97.4%

Minimum 5 values

Value | Count | Frequency (%)
1009 | 1 | 0.1%
1051 | 1 | 0.1%
1052 | 1 | 0.1%
1081 | 1 | 0.1%
1118 | 1 | 0.1%

Maximum 5 values

Value | Count | Frequency (%)
19999 | 1 | 0.1%
19973 | 1 | 0.1%
19926 | 1 | 0.1%
19859 | 1 | 0.1%
19847 | 1 | 0.1%

NumCompaniesWorked
Real number (ℝ≥0)

ZEROS
Distinct count: 10
Unique (%): 1.0%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 2.704
Minimum: 0
Maximum: 9
Zeros: 130
Zeros (%): 13.0%
Memory size: 15.6 KiB

Quantile statistics

Minimum: 0
5-th percentile: 0
Q1: 1
Median: 2
Q3: 4
95-th percentile: 8
Maximum: 9
Range: 9
Interquartile range (IQR): 3

Descriptive statistics

Standard deviation: 2.490499265
Coefficient of variation (CV): 0.9210426274
Kurtosis: 0.02551822966
Mean: 2.704
Median Absolute Deviation (MAD): 1
Skewness: 1.030068887
Sum: 2704
Variance: 6.202586587

Histogram with fixed size bins (bins=10)

Common values

Value | Count | Frequency (%)
1 | 351 | 35.1%
0 | 130 | 13.0%
3 | 114 | 11.4%
2 | 104 | 10.4%
4 | 94 | 9.4%
7 | 50 | 5.0%
6 | 50 | 5.0%
5 | 38 | 3.8%
9 | 35 | 3.5%
8 | 34 | 3.4%

Minimum 5 values

Value | Count | Frequency (%)
0 | 130 | 13.0%
1 | 351 | 35.1%
2 | 104 | 10.4%
3 | 114 | 11.4%
4 | 94 | 9.4%

Maximum 5 values

Value | Count | Frequency (%)
9 | 35 | 3.5%
8 | 34 | 3.4%
7 | 50 | 5.0%
6 | 50 | 5.0%
5 | 38 | 3.8%

OverTime
Boolean

Distinct count: 2
Unique (%): 0.2%
Missing: 0
Missing (%): 0.0%
Memory size: 15.6 KiB

Value | Count | Frequency (%)
No | 716 | 71.6%
Yes | 284 | 28.4%

TrainingTimesLastYear
Real number (ℝ≥0)

ZEROS
Distinct count: 7
Unique (%): 0.7%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 2.841
Minimum: 0
Maximum: 6
Zeros: 34
Zeros (%): 3.4%
Memory size: 15.6 KiB

Quantile statistics

Minimum: 0
5-th percentile: 1
Q1: 2
Median: 3
Q3: 3
95-th percentile: 5
Maximum: 6
Range: 6
Interquartile range (IQR): 1

Descriptive statistics

Standard deviation: 1.300542352
Coefficient of variation (CV): 0.4577762592
Kurtosis: 0.4313947547
Mean: 2.841
Median Absolute Deviation (MAD): 1
Skewness: 0.5676822032
Sum: 2841
Variance: 1.69141041

Histogram with fixed size bins (bins=10)

Common values

Value | Count | Frequency (%)
2 | 362 | 36.2%
3 | 346 | 34.6%
5 | 89 | 8.9%
4 | 75 | 7.5%
6 | 48 | 4.8%
1 | 46 | 4.6%
0 | 34 | 3.4%

Minimum 5 values

Value | Count | Frequency (%)
0 | 34 | 3.4%
1 | 46 | 4.6%
2 | 362 | 36.2%
3 | 346 | 34.6%
4 | 75 | 7.5%

Maximum 5 values

Value | Count | Frequency (%)
6 | 48 | 4.8%
5 | 89 | 8.9%
4 | 75 | 7.5%
3 | 346 | 34.6%
2 | 362 | 36.2%

CommunicationSkill
Real number (ℝ≥0)

Distinct count: 5
Unique (%): 0.5%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 3.041
Minimum: 1
Maximum: 5
Zeros: 0
Zeros (%): 0.0%
Memory size: 15.6 KiB

Quantile statistics

Minimum: 1
5-th percentile: 1
Q1: 2
Median: 3
Q3: 4
95-th percentile: 5
Maximum: 5
Range: 4
Interquartile range (IQR): 2

Descriptive statistics

Standard deviation: 1.413972531
Coefficient of variation (CV): 0.4649695926
Kurtosis: -1.296248835
Mean: 3.041
Median Absolute Deviation (MAD): 1
Skewness: -0.0428380009
Sum: 3041
Variance: 1.999318318

Histogram with fixed size bins (bins=10)

Common values

Value | Count | Frequency (%)
5 | 207 | 20.7%
4 | 206 | 20.6%
3 | 201 | 20.1%
2 | 193 | 19.3%
1 | 193 | 19.3%

Minimum 5 values

Value | Count | Frequency (%)
1 | 193 | 19.3%
2 | 193 | 19.3%
3 | 201 | 20.1%
4 | 206 | 20.6%
5 | 207 | 20.7%

Maximum 5 values

Value | Count | Frequency (%)
5 | 207 | 20.7%
4 | 206 | 20.6%
3 | 201 | 20.1%
2 | 193 | 19.3%
1 | 193 | 19.3%

OwnStocks
Boolean

Distinct count: 2
Unique (%): 0.2%
Missing: 0
Missing (%): 0.0%
Memory size: 15.6 KiB

Value | Count | Frequency (%)
1 | 572 | 57.2%
0 | 428 | 42.8%

PropWorkLife
Real number (ℝ≥0)

Distinct count: 346
Unique (%): 34.6%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 0.28684550215494325
Minimum: 0.0
Maximum: 0.6727272727272727
Zeros: 7
Zeros (%): 0.7%
Memory size: 15.6 KiB

Quantile statistics

Minimum: 0
5-th percentile: 0.04751082251
Q1: 0.1794871795
Median: 0.2631578947
Q3: 0.4
95-th percentile: 0.5661672216
Maximum: 0.6727272727
Range: 0.6727272727
Interquartile range (IQR): 0.2205128205

Descriptive statistics

Standard deviation: 0.1538261788
Coefficient of variation (CV): 0.5362684012
Kurtosis: -0.5050614235
Mean: 0.2868455022
Median Absolute Deviation (MAD): 0.09465881685
Skewness: 0.43071975
Sum: 286.8455022
Variance: 0.0236624933

Histogram with fixed size bins (bins=10)

Common values

Value | Count | Frequency (%)
0.3333333333 | 22 | 2.2%
0.2 | 21 | 2.1%
0.5 | 19 | 1.9%
0.25 | 18 | 1.8%
0.2222222222 | 17 | 1.7%
0.2857142857 | 16 | 1.6%
0.3076923077 | 13 | 1.3%
0.3225806452 | 13 | 1.3%
0.1666666667 | 12 | 1.2%
0.4 | 12 | 1.2%
Other values (336) | 837 | 83.7%

Minimum 5 values

Value | Count | Frequency (%)
0 | 7 | 0.7%
0.01960784314 | 1 | 0.1%
0.02222222222 | 1 | 0.1%
0.02631578947 | 1 | 0.1%
0.02857142857 | 6 | 0.6%

Maximum 5 values

Value | Count | Frequency (%)
0.6727272727 | 2 | 0.2%
0.6666666667 | 1 | 0.1%
0.6607142857 | 1 | 0.1%
0.6603773585 | 1 | 0.1%
0.6551724138 | 1 | 0.1%

Interactions

Correlations

Pearson's r

Pearson's correlation coefficient (r) is a measure of linear correlation between two variables. Its value lies between -1 and +1, with -1 indicating total negative linear correlation, 0 indicating no linear correlation and +1 indicating total positive linear correlation. Furthermore, r is invariant under separate changes in location and scale of the two variables, so for a linear relationship the steepness of the line does not affect r; only its direction does.

To calculate r for two variables X and Y, one divides the covariance of X and Y by the product of their standard deviations.
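
The calculation described above can be sketched in plain Python. This is an illustration, not the code the report generator runs; the function name `pearson_r` and the toy data are made up for the example.

```python
import math


def pearson_r(xs, ys):
    """Covariance of X and Y divided by the product of their standard deviations."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equally long samples with at least 2 points")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # The 1/n factors of the covariance and the standard deviations cancel,
    # so plain sums of deviations are enough.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    ss_x = sum((x - mean_x) ** 2 for x in xs)
    ss_y = sum((y - mean_y) ** 2 for y in ys)
    if ss_x == 0 or ss_y == 0:
        raise ValueError("r is undefined when a variable is constant")
    return cov / math.sqrt(ss_x * ss_y)


x = [1, 2, 3, 4, 5]
steep = pearson_r(x, [2 * v + 1 for v in x])      # steep increasing line
shallow = pearson_r(x, [0.1 * v - 7 for v in x])  # shallow increasing line
falling = pearson_r(x, [-3 * v for v in x])       # decreasing line
print(steep, shallow, falling)
```

Both increasing lines give r close to +1 regardless of slope or offset, and the decreasing line gives r close to -1, which illustrates the location and scale invariance.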

Spearman's ρ

Spearman's rank correlation coefficient (ρ) is a measure of monotonic correlation between two variables, and is therefore better than Pearson's r at capturing nonlinear monotonic correlations. Its value lies between -1 and +1, with -1 indicating total negative monotonic correlation, 0 indicating no monotonic correlation and +1 indicating total positive monotonic correlation.

To calculate ρ for two variables X and Y, one divides the covariance of the rank variables of X and Y by the product of their standard deviations.
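
A minimal sketch of that recipe: rank both variables, then apply the Pearson formula to the ranks. It assumes tied values receive the average of their positions, which is a common convention; the helper names and toy data are illustrative only.

```python
import math


def average_ranks(values):
    """Rank values from 1 to n; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to the end of the run of equal values.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # positions are 0-based, ranks are 1-based
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def pearson_r(xs, ys):
    """Covariance over the product of standard deviations (1/n factors cancel)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    ss_x = sum((x - mean_x) ** 2 for x in xs)
    ss_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(ss_x * ss_y)


def spearman_rho(xs, ys):
    """Pearson's r computed on the rank variables of X and Y."""
    return pearson_r(average_ranks(xs), average_ranks(ys))


x = [1, 2, 3, 4, 5]
y = [v ** 3 for v in x]  # monotonic but clearly nonlinear
rho = spearman_rho(x, y)
r = pearson_r(x, y)
print(rho, r)
```

For this cubic relationship ρ is 1 because the ranks agree perfectly, while r stays below 1 because the points do not lie on a straight line.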

Kendall's τ

Similarly to Spearman's rank correlation coefficient, the Kendall rank correlation coefficient (τ) measures ordinal association between two variables. Its value lies between -1 and +1, with -1 indicating total negative correlation, 0 indicating no correlation and +1 indicating total positive correlation.

To calculate τ for two variables X and Y, one determines the number of concordant and discordant pairs of observations. τ is the difference between the number of concordant pairs and the number of discordant pairs, divided by the total number of pairs.
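
The pair-counting definition above corresponds to the variant usually called tau-a, sketched below in plain Python. This may differ from what the report generator computes when the data contain ties: SciPy's `kendalltau`, for example, defaults to the tie-adjusted tau-b. The function name and toy data are illustrative.

```python
from itertools import combinations


def kendall_tau_a(xs, ys):
    """(concordant - discordant) / total pairs; tied pairs count as neither."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equally long samples with at least 2 points")
    concordant = discordant = 0
    for (x1, y1), (x2, y2) in combinations(zip(xs, ys), 2):
        sign = (x1 - x2) * (y1 - y2)
        if sign > 0:
            concordant += 1  # both variables move in the same direction
        elif sign < 0:
            discordant += 1  # the variables move in opposite directions
    total_pairs = n * (n - 1) / 2
    return (concordant - discordant) / total_pairs


x = [1, 2, 3, 4, 5]
y = [3, 1, 2, 5, 4]
tau = kendall_tau_a(x, y)
print(tau)
```

Of the 10 pairs in this toy sample, 7 are concordant and 3 are discordant, so τ = (7 - 3) / 10 = 0.4.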

Phik (φk)

Phik (φk) is a new and practical correlation coefficient that works consistently between categorical, ordinal and interval variables, captures non-linear dependency and reverts to the Pearson correlation coefficient in case of a bivariate normal input distribution. Extensive documentation is available from the Phik project.

Cramér's V (φc)

Cramér's V is an association measure for nominal random variables. The coefficient ranges from 0 to 1, with 0 indicating independence and 1 indicating perfect association. The empirical estimators used for Cramér's V have been proved to be biased, even for large samples. We use the bias-corrected measure proposed by Bergsma (2013).
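
As a rough illustration of the bias correction, the sketch below computes the corrected coefficient from a contingency table of counts. It follows the commonly quoted form of Bergsma's correction, in which χ²/n and the table dimensions are each shrunk by a term of order 1/(n-1); it is not taken from the report generator's source, and the function name and example tables are made up.

```python
import math


def cramers_v_corrected(table):
    """Bias-corrected Cramér's V for a contingency table given as a list of rows."""
    r, k = len(table), len(table[0])
    if r < 2 or k < 2:
        raise ValueError("need at least a 2 x 2 table")
    n = sum(sum(row) for row in table)
    row_totals = [sum(row) for row in table]
    col_totals = [sum(row[j] for row in table) for j in range(k)]

    # Pearson chi-squared statistic against the independence expectation.
    chi2 = 0.0
    for i in range(r):
        for j in range(k):
            expected = row_totals[i] * col_totals[j] / n
            if expected > 0:
                chi2 += (table[i][j] - expected) ** 2 / expected

    phi2 = chi2 / n
    # Corrected phi squared and corrected table dimensions.
    phi2_corr = max(0.0, phi2 - (k - 1) * (r - 1) / (n - 1))
    r_corr = r - (r - 1) ** 2 / (n - 1)
    k_corr = k - (k - 1) ** 2 / (n - 1)
    return math.sqrt(phi2_corr / min(k_corr - 1, r_corr - 1))


independent = cramers_v_corrected([[10, 20], [20, 40]])  # rows are proportional
perfect = cramers_v_corrected([[50, 0], [0, 50]])        # one category fixes the other
print(independent, perfect)
```

The proportional table gives 0 and the diagonal table gives a value of 1 up to floating-point rounding, matching the two ends of the range described above.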

Missing values

Sample

First rows

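Index | Attrition | BusinessTravel | DistanceFromHome | EducationField | EnvironmentSatisfaction | JobInvolvement | JobSatisfaction | MaritalStatus | MonthlyIncome | NumCompaniesWorked | OverTime | TrainingTimesLastYear | CommunicationSkill | OwnStocks | PropWorkLife
0 | 0 | Non-Travel | 2 | Medical | 3 | 3 | 4 | Single | 2564 | 0 | No | 2 | 4 | 0 | 0.400000
1 | 0 | Travel_Rarely | 12 | Life Sciences | 3 | 3 | 3 | Married | 4663 | 9 | Yes | 2 | 2 | 1 | 0.194444
2 | 1 | Travel_Rarely | 2 | Medical | 3 | 3 | 4 | Single | 5160 | 4 | No | 3 | 5 | 0 | 0.218182
3 | 0 | Travel_Rarely | 24 | Life Sciences | 1 | 3 | 4 | Single | 4108 | 7 | No | 2 | 4 | 0 | 0.461538
4 | 0 | Travel_Rarely | 3 | Other | 3 | 3 | 3 | Married | 9434 | 1 | No | 2 | 1 | 1 | 0.270270
5 | 0 | Travel_Rarely | 7 | Life Sciences | 2 | 2 | 3 | Married | 2329 | 3 | No | 2 | 2 | 0 | 0.419355
6 | 1 | Travel_Rarely | 1 | Life Sciences | 4 | 2 | 3 | Single | 3730 | 0 | Yes | 2 | 1 | 0 | 0.125000
7 | 0 | Travel_Rarely | 4 | Medical | 1 | 2 | 2 | Married | 3838 | 8 | No | 5 | 5 | 0 | 0.242424
8 | 0 | Travel_Frequently | 11 | Marketing | 4 | 3 | 4 | Divorced | 4968 | 1 | No | 3 | 4 | 1 | 0.142857
9 | 1 | Travel_Rarely | 7 | Marketing | 2 | 3 | 2 | Single | 2679 | 1 | No | 3 | 5 | 0 | 0.047619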

Last rows

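Index | Attrition | BusinessTravel | DistanceFromHome | EducationField | EnvironmentSatisfaction | JobInvolvement | JobSatisfaction | MaritalStatus | MonthlyIncome | NumCompaniesWorked | OverTime | TrainingTimesLastYear | CommunicationSkill | OwnStocks | PropWorkLife
990 | 0 | Travel_Frequently | 6 | Life Sciences | 1 | 4 | 1 | Married | 5562 | 3 | Yes | 3 | 5 | 1 | 0.250000
991 | 0 | Travel_Frequently | 10 | Medical | 4 | 3 | 3 | Divorced | 3815 | 1 | Yes | 4 | 1 | 1 | 0.147059
992 | 0 | Travel_Rarely | 1 | Medical | 3 | 4 | 4 | Divorced | 9613 | 0 | No | 5 | 3 | 1 | 0.487179
993 | 1 | Travel_Frequently | 9 | Life Sciences | 3 | 1 | 3 | Married | 12936 | 7 | No | 3 | 3 | 0 | 0.531915
994 | 0 | Travel_Rarely | 7 | Life Sciences | 3 | 3 | 1 | Married | 9985 | 8 | No | 1 | 1 | 1 | 0.232558
995 | 0 | Non-Travel | 10 | Medical | 2 | 3 | 4 | Single | 9980 | 1 | No | 3 | 4 | 0 | 0.277778
996 | 0 | Travel_Rarely | 16 | Life Sciences | 3 | 3 | 4 | Single | 7945 | 6 | Yes | 2 | 2 | 0 | 0.450000
997 | 1 | Travel_Rarely | 9 | Medical | 3 | 2 | 4 | Single | 9619 | 1 | No | 3 | 4 | 0 | 0.195652
998 | 0 | Travel_Rarely | 2 | Medical | 3 | 2 | 4 | Single | 6877 | 5 | Yes | 4 | 5 | 0 | 0.400000
999 | 0 | Travel_Frequently | 2 | Marketing | 3 | 2 | 2 | Married | 7525 | 2 | No | 2 | 3 | 1 | 0.566038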